Most of the time when you use R, you use a functional programming style: you start with some data and apply a function to manipulate it. This returns some new data, to which you apply another function, and you repeat this until you get an answer.
With a functional mindset you typically start by thinking about what you want a function to do, then you worry about the objects that get passed to the function (its arguments). Finally you worry about the objects that come out the other end (its return values).
Object-oriented programming (OOP) takes a different approach. You start by thinking about the objects that you have to work with (for example, a teapot). Then you think about what data you need to describe the object (for example, the total capacity of the teapot and how much liquid it currently holds). Next you think about the functionality of that object (for example, the main purpose of a teapot is to pour tea, so you add a pour function).
In OOP, functions are known as methods (basically, methods are functions in an object-oriented context). There are two variable types that are important in OOP.
Because these variable types can contain many other variables (and types), you can use them to build many other, more complex types. The functional programming approach is generally preferred for data analysis.
OOP works best when you have a limited number of objects whose behavior you completely understand. So OOP is good for building the tools that are used for data analysis, but a poor fit for data analysis itself. One of the principles of OOP is that functions can behave differently for different kinds of object; the summary() function is an example.
# Create these variables
a_numeric_vector <- rlnorm(50)
a_factor <- factor(
sample(c(LETTERS[1:5], NA), 50, replace = TRUE)
)
a_data_frame <- data.frame(
n = a_numeric_vector,
f = a_factor
)
a_linear_model <- lm(dist ~ speed, cars)
# Call summary() on the numeric vector
summary(a_numeric_vector)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1103 0.5207 0.9501 1.3263 2.0918 4.6469
# Call summary() on the factor
summary(a_factor)
## A B C D E NA's
## 10 7 9 6 7 11
# Call summary() on the data frame
summary(a_data_frame)
## n f
## Min. :0.1103 A :10
## 1st Qu.:0.5207 B : 7
## Median :0.9501 C : 9
## Mean :1.3263 D : 6
## 3rd Qu.:2.0918 E : 7
## Max. :4.6469 NA's:11
# Call summary() on the linear model
summary(a_linear_model)
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
Of the nine OOP systems available for R, most see little use because of their limitations. Only two are of real practical use: S3 and R6.
str() and class() are the functions used to examine the structure and class of a variable, respectively. Sometimes, however, you need to dig deeper, and there are other functions to consider.
The functions mode() and storage.mode() exist solely for compatibility with older S code, so while you need to know they exist, you should rarely need to use them.
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
## [,1] [,2] [,3] [,4]
## [1,] 0.74181895 1.0518520 -0.9652027 -1.0421293
## [2,] 0.04219933 -1.1993195 1.5072984 0.2748198
## [3,] 2.37485335 -0.2832735 0.8662924 0.6323595
## [1] "matrix"
## [1] "matrix"
## [1] "integer"
## [1] "double"
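The outputs above look like the result of inspecting two 3-by-4 matrices, one of integers and one of doubles. The following is a minimal sketch of that comparison; the variable names int_mat and num_mat and the way the matrices are built are assumptions, not necessarily the original code.

```r
# Assumed reconstruction: an integer matrix and a double matrix
int_mat <- matrix(1:12, nrow = 3)
num_mat <- matrix(rnorm(12), nrow = 3)

# class() reports the high-level kind of object.
# Note: R 4.0.0 and later report c("matrix", "array") rather than "matrix".
class(int_mat)
class(num_mat)

# typeof() reveals how the contents are stored
typeof(int_mat)  # "integer"
typeof(num_mat)  # "double"
```

The point of the comparison is that two objects with the same class can still differ in their underlying type.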
There are some rarer variable types that you may not have come across yet.
Also note that there are three kinds of functions in R.
Most of the functions that you come across are called closures. A few important functions, like length() are known as builtin functions, which use a special evaluation mechanism to make them go faster. Language constructs, like if and while are also functions! They are known as special functions.
# Create a function
type_info <- function(x)
{
c(
class = class(x),
typeof = typeof(x),
mode = mode(x),
storage.mode = storage.mode(x)
)
}
# Create list of example variables
some_vars <- list(
an_integer_vector = rpois(24, lambda = 5),
a_numeric_vector = rbeta(24, shape1 = 1, shape2 = 1),
an_integer_array = array(rbinom(24, size = 8, prob = 0.5), dim = c(2, 3, 4)),
a_numeric_array = array(rweibull(24, shape = 1, scale = 1), dim = c(2, 3, 4)),
a_data_frame = data.frame(int = rgeom(24, prob = 0.5), num = runif(24)),
a_factor = factor(month.abb),
a_formula = y ~ x,
a_closure_function = mean,
a_builtin_function = length,
a_special_function = `if`
)
# Loop over some_vars calling type_info() on each element to explore them
lapply(some_vars,type_info)
## $an_integer_vector
## class typeof mode storage.mode
## "integer" "integer" "numeric" "integer"
##
## $a_numeric_vector
## class typeof mode storage.mode
## "numeric" "double" "numeric" "double"
##
## $an_integer_array
## class typeof mode storage.mode
## "array" "integer" "numeric" "integer"
##
## $a_numeric_array
## class typeof mode storage.mode
## "array" "double" "numeric" "double"
##
## $a_data_frame
## class typeof mode storage.mode
## "data.frame" "list" "list" "list"
##
## $a_factor
## class typeof mode storage.mode
## "factor" "integer" "numeric" "integer"
##
## $a_formula
## class typeof mode storage.mode
## "formula" "language" "call" "language"
##
## $a_closure_function
## class typeof mode storage.mode
## "function" "closure" "function" "function"
##
## $a_builtin_function
## class typeof mode storage.mode
## "function" "builtin" "function" "function"
##
## $a_special_function
## class typeof mode storage.mode
## "function" "special" "function" "function"
The class() function can be used to override the class of an object as well as to retrieve it, and overriding the class does not break existing functionality.
Note: Overriding the class doesn’t change the typeof(), mode(), or storage.mode() of the object, because these are fundamental properties of an R object.
In the example below you can see that class() overrides the class of an object but not its type.
## [1] 0.4714675 1.8163304 0.3964557 0.5707266 0.1212334 3.4097801 0.5029485
## [8] 0.8229876 0.2082152 1.1661194
## [1] "double"
## [1] "random_numbers"
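A minimal sketch of the kind of code that would produce output like the above. The use of rlnorm() to generate the numbers is an assumption; the class name random_numbers is taken from the output.

```r
# Assumed reconstruction: ten positive random numbers
x <- rlnorm(10)

# Override the class of x
class(x) <- "random_numbers"

# The type is unchanged, only the class differs
typeof(x)  # "double"
class(x)   # "random_numbers"

# Existing functionality still works, because it relies on the type
mean(x)
```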
Previously we saw how the summary() function behaved differently depending on the type of its input argument. Having a function behave differently for different kinds of input is called function overloading (input-dependent function behavior). Its main purpose is to simplify your code: without it, you would have to learn many more functions.
The S3 system exists entirely to solve this problem. It does this by splitting the function into two parts: a generic function and the method functions that it dispatches to.
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x0000000011f81c38>
## <environment: namespace:base>
As you can see above, the print() function is really very simple; it is only one line long. This is very typical of an S3 generic. All the function needs to do is call UseMethod() with its own name. That is, "print" is passed to UseMethod() as a string.
There are two conditions you must follow for S3 methods: the name of the method must be of the form generic.class, and the arguments of the method must include the arguments of the generic.
In the example below, the arguments to print() are x and an ellipsis, whereas the arguments to print.Date() are the arguments of the generic plus an extra max argument.
## function (x, ...)
## NULL
## function (x, max = NULL, ...)
## NULL
The ellipsis argument allows arguments to be passed from one method to another. It is good practice to include an ellipsis argument in both the generic and the methods. All the methods corresponding to a generic are completely independent. In the example below you can see that print.function and print.Date are completely unrelated.
Because S3 requires a dot to separate the name of the generic from the class of the input, it is a bad idea to include a dot in the names of your variables. Names with words separated by dots are sometimes called leopard.case. Don’t use this naming convention. Better conventions are lower_snake_case, where lowercase words are separated by underscores, or lowerCamelCase, where the first word is lowercase and subsequent words start with a capital letter.
## function (x, useSource = TRUE, ...)
## .Internal(print.function(x, useSource, ...))
## <bytecode: 0x0000000018673f40>
## <environment: namespace:base>
## function (x, max = NULL, ...)
## {
## if (is.null(max))
## max <- getOption("max.print", 9999L)
## if (max < length(x)) {
## print(format(x[seq_len(max)]), max = max + 1, ...)
## cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
## length(x) - max, "entries ]\n")
## }
## else if (length(x))
## print(format(x), max = max, ...)
## else cat(class(x)[1L], "of length 0\n")
## invisible(x)
## }
## <bytecode: 0x0000000018889a60>
## <environment: namespace:base>
What’s in a Name?
S3 uses a strict naming convention: all S3 methods have a name of the form generic.class.
The converse is not true: a function can have a name containing a dot without being an S3 method. This is the case with many of the functions that have been around since the early days of the S language. For example, all.equal() is actually an S3 generic, not a method. (This is an example of how leopard.case can be confusing.)
You can check if a function is an S3 generic by calling is_s3_generic() from the pryr package. You can also print it (by typing its name in the console), then looking to see if it calls UseMethod().
Similarly, you can check if a function is an S3 method by calling is_s3_method() from pryr. For example,
## [1] TRUE
## [1] TRUE
## [1] FALSE
Creating a Generic Function
You can create your own S3 functions. The first step is to write the generic. This is typically a single line function that calls UseMethod(), passing its name as a string.
The first argument to an S3 generic is usually called x, though this isn’t compulsory. It is also good practice to include a … (“ellipsis”, or “dot-dot-dot”) argument, in case arguments need to be passed from one method to another.
Overall, the structure of an S3 generic looks like this.
an_s3_generic <- function(x, maybe = "some", other = "arguments", ...) {
UseMethod("an_s3_generic")
}
Creating an S3 Method
By itself, the generic function doesn’t do anything. For that, you need to create methods, which are just regular functions with two conditions:
The name of the method must be of the form generic.class.
The method signature (that is, the arguments that are passed in to the method) must contain the signature of the generic.
The syntax is:
## function(x, ...)
## {
## UseMethod("get_n_elements")
## }
# Create a data.frame method for get_n_elements
get_n_elements.data.frame <- function(x, ...)
{
nrow(x) * ncol(x) # or prod(dim(x))
}
# Call the method on the sleep dataset
n_elements_sleep <- get_n_elements(sleep)
# View the result
n_elements_sleep
## [1] 60
Creating an S3 method (2)
If no suitable method is found for a generic, then an error is thrown. For example, at the moment, get_n_elements() only has a method available for data.frames. If you pass a matrix to get_n_elements() instead, you’ll see an error.
## Error in UseMethod("get_n_elements"): no applicable method for 'get_n_elements' applied to an object of class "c('matrix', 'logical')"
Rather than having to write dozens of methods for every kind of input, you can create a method that handles all types that don’t have a specific method. This is called the default method; it always has the name generic.default. For example, print.default() will print any type of object that doesn’t have its own print() method.
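Putting the pieces together, here is a minimal sketch of a generic with one class-specific method and a default method. The generic and the data.frame method follow the code shown earlier; the body of get_n_elements.default() is an assumption chosen for illustration.

```r
# The generic: a one-line function that hands off to UseMethod()
get_n_elements <- function(x, ...) {
  UseMethod("get_n_elements")
}

# Method for data frames
get_n_elements.data.frame <- function(x, ...) {
  nrow(x) * ncol(x)
}

# Default method (assumed implementation): used for any class
# that has no method of its own
get_n_elements.default <- function(x, ...) {
  length(unlist(x))
}

# Dispatches to the data.frame method
n_elements_sleep <- get_n_elements(sleep)

# A matrix has no dedicated method, so the default method is used
n_elements_matrix <- get_n_elements(matrix(TRUE, nrow = 2, ncol = 3))
```

With the default method in place, passing a matrix no longer throws the "no applicable method" error.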
## a_data_frame : 'data.frame': 50 obs. of 2 variables:
## $ n: num 0.896 0.199 2.845 0.508 4.474 ...
## $ f: Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_factor : Factor w/ 5 levels "A","B","C","D",..: 2 NA NA 4 1 4 NA 5 2 4 ...
## a_linear_model : List of 12
## $ coefficients : Named num [1:2] -17.58 3.93
## $ residuals : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
## $ effects : Named num [1:50] -303.914 145.552 -8.115 9.885 0.194 ...
## $ rank : int 2
## $ fitted.values: Named num [1:50] -1.85 -1.85 9.95 9.95 13.88 ...
## $ assign : int [1:2] 0 1
## $ qr :List of 5
## $ df.residual : int 48
## $ xlevels : Named list()
## $ call : language lm(formula = dist ~ speed, data = cars)
## $ terms :Classes 'terms', 'formula' language dist ~ speed
## $ model :'data.frame': 50 obs. of 2 variables:
## a_numeric_vector : num [1:50] 0.896 0.199 2.845 0.508 4.474 ...
## get_n_elements : function (x, ...)
## get_n_elements.data.frame : function (x, ...)
## int_mat : int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
## n_elements_sleep : int 60
## num_mat : num [1:3, 1:4] 0.7418 0.0422 2.3749 1.0519 -1.1993 ...
## some_vars : List of 10
## $ an_integer_vector : int [1:24] 1 8 3 9 6 7 2 3 6 8 ...
## $ a_numeric_vector : num [1:24] 0.2128 0.0198 0.9113 0.9216 0.2297 ...
## $ an_integer_array : int [1:2, 1:3, 1:4] 5 4 5 5 3 6 6 5 0 4 ...
## $ a_numeric_array : num [1:2, 1:3, 1:4] 1.2425 0.0126 1.6071 0.299 1.1228 ...
## $ a_data_frame :'data.frame': 24 obs. of 2 variables:
## $ a_factor : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 4 8 1 9 7 6 2 12 11 ...
## $ a_formula :Class 'formula' language y ~ x
## $ a_closure_function:function (x, ...)
## $ a_builtin_function:function (x)
## $ a_special_function:.Primitive("if")
## type_info : function (x)
## x : 'random_numbers' num [1:10] 0.471 1.816 0.396 0.571 0.121 ...
There are a lot of S3 functions in R, and now you are going to learn how to find out what is available. When you have a generic function in R, it is often useful to know which methods are available for that generic. To answer this you can use the methods() function. To use it, you pass the function or a string naming that function.
## [1] mean.Date mean.default mean.difftime mean.POSIXct mean.POSIXlt
## [6] mean.quosure*
## see '?methods' for accessing help and source code
What methods are available for a given class of object? You can find this out too with the methods() function, using the class argument (with or without quotes).
## [1] add1 anova coerce confint cooks.distance
## [6] deviance drop1 effects extractAIC family
## [11] formula influence initialize logLik model.frame
## [16] nobs predict print residuals rstandard
## [21] rstudent show slotsFromS3 summary vcov
## [26] weights
## see '?methods' for accessing help and source code
Actually, methods() is more generous with its return value than giving just the S3 methods for a given generic or class. It returns both S3 methods and S4 methods. To find only the S3 methods for a given generic or class, use the .S3methods() function; for S4 methods, use .S4methods().
## [1] add1 anova confint cooks.distance deviance
## [6] drop1 effects extractAIC family formula
## [11] influence logLik model.frame nobs predict
## [16] print residuals rstandard rstudent summary
## [21] vcov weights
## see '?methods' for accessing help and source code
## [1] coerce initialize show slotsFromS3
## see '?methods' for accessing help and source code
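The three listings above can be produced with calls along these lines. This is a sketch of the likely calls; the exact methods returned depend on your R version and on which packages are loaded.

```r
# All methods (S3 and S4) for the mean() generic
mean_methods <- methods("mean")
mean_methods

# All methods (S3 and S4) available for objects of class "lm"
methods(class = "lm")

# Only the S3 methods for class "lm"
lm_s3_methods <- .S3methods(class = "lm")
lm_s3_methods

# Only the S4 methods for class "lm"
.S4methods(class = "lm")
```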
For many data analyses, the time-consuming tasks are
This means that R is optimized to make these tasks as quick as possible. In some cases, however, the speed of the code is more important.
Functions for which speed is critical aren’t actually written in R; instead they are written in C. The reason is that C code typically runs faster than R code, so writing in C increases performance. The tradeoff is that C code takes longer to write and is harder to debug.
R has several interfaces to the C language, and the highest-performance of these is known as the primitive interface. This is reserved for a few fundamental features in base R. Functions that use the primitive interface are called primitive functions (for example exp, sin, +, -, for, if).
Primitive functions can also be generic, and it is important to note that these behave slightly differently from other generic functions. You can see the complete list of primitive S3 generics using .S3PrimitiveGenerics (about 30 functions; 29 in the output below). The big difference between a primitive generic and a regular generic is what happens when a suitable method can’t be found.
## [1] "anyNA" "as.character" "as.complex" "as.double"
## [5] "as.environment" "as.integer" "as.logical" "as.numeric"
## [9] "as.raw" "c" "dim" "dim<-"
## [13] "dimnames" "dimnames<-" "is.array" "is.finite"
## [17] "is.infinite" "is.matrix" "is.na" "is.nan"
## [21] "is.numeric" "length" "length<-" "levels<-"
## [25] "names" "names<-" "rep" "seq.int"
## [29] "xtfrm"
## [1] "1970-01-01" "2012-12-21"
## Error in as.Date.default(all_of_time): do not know how to convert 'all_of_time' to class "Date"
## [1] 2
Because as.Date() is not a primitive generic, when you override the class to date_strings no method can be found and an error is thrown. By contrast, look at what happens with the length() function. length() is a primitive generic; it is so important that it shouldn’t break just because the class has changed.
For primitive generics, rather than throwing an error when no suitable method is found, the function goes directly to the C code, using typeof() to determine the type of the input.
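A minimal sketch of the contrast described above. The variable name all_of_time and the two date strings are taken from the output; the use of try() to capture the error is an addition so that the example runs to completion.

```r
# A character vector of dates
all_of_time <- c("1970-01-01", "2012-12-21")

# as.Date() works while the vector is a plain character vector
as.Date(all_of_time)

# Override the class
class(all_of_time) <- "date_strings"

# as.Date() is a regular generic: no method exists for "date_strings",
# so the default method throws an error
conversion_failed <- inherits(
  try(as.Date(all_of_time), silent = TRUE),
  "try-error"
)
conversion_failed

# length() is a primitive generic: with no method for "date_strings",
# it falls through to the C code and still works
length(all_of_time)
```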
Variables can have more than one class. In this case, rather than being a single string, the class is a character vector. In the example below, the vector of numbers is described using three classes. The order of the classes is important: the most specific class comes first, and they get gradually less specific as you move from left to right. It is good practice to keep the original class as the final class (here, numeric).
To test for arbitrary classes you can use the general-purpose inherits() function. As you can see in the example below, x inherits from triangular_numbers, from natural_numbers and from numeric.
x <- c(1, 3, 6, 10, 15)
class(x) <- c("triangular_numbers","natural_numbers","numeric")
is.numeric(x)
## [1] TRUE
## Error in is.triangular_numbers(x): could not find function "is.triangular_numbers"
## [1] TRUE
## [1] TRUE
# inherits(x, "numeric") returns the same thing as is.numeric(x), but the more
# general function is slower, so use the specific function if one is available
inherits(x,"numeric")
## [1] TRUE
If your object has multiple classes, you can call multiple S3 methods in turn using the NextMethod() function.
what_am_i <- function(x, ...){
UseMethod("what_am_i")
}
what_am_i.triangular_numbers <- function(x, ...){
message("I'm triangular numbers")
NextMethod("what_am_i")
}
what_am_i.natural_numbers <- function(x, ...){
message("I'm natural numbers")
NextMethod("what_am_i")
}
what_am_i.numeric <- function(x, ...){
message("I'm numeric")
}
what_am_i(x)
## I'm triangular numbers
## I'm natural numbers
## I'm numeric
The R6 system provides a way of storing data and the functions that act on it within the same variable.
The first step in working with R6 is to create a class generator for each of your objects. A class generator is a template that describes what data can be stored in the object and what functions can be applied to the object. It is also used to create the specified objects. For this reason class generators are also called factories.
Factories are defined using the R6Class() function. The first argument to R6Class() is the name of the class. By convention this should be in UpperCamelCase. Another argument, private, stores the object’s data. It is always a list, and each element of the list must be named. There are two more arguments, public and active, which are discussed later.
The second step in working with R6 is to create some objects. You can do this by calling the new() method of the factory. Since it is a factory, you can churn out as many objects as you like.
In OOP, separating the implementation of an object from its user interface is called encapsulation. In R6 all the implementation details are stored in the private element of the class. By contrast, the user interface details are stored in the public element.
The public element is also specified as a named list and its content are mostly functions.
The data fields in the private elements can be accessed using the prefix private$.
In example below private field door_is_open is accessed in the function open_door using private$door_is_open.
It is also possible to access other public elements of a class using the self$ prefix or (…).
# Load the R6 package, which provides R6Class()
library(R6)
# Define microwave_oven_factory
microwave_oven_factory <- R6Class(
"MicrowaveOven"
,private = list(
power_rating_watts = 800
,door_is_open = FALSE
)
,public = list(
open_door = function(){
private$door_is_open <- TRUE
}
,close_door = function() {
private$door_is_open <- FALSE
}
,cook = function(time_seconds){
Sys.sleep(time_seconds)
print("Your food is cooked!")
}
)
)
# Create microwave oven object
a_microwave_oven <- microwave_oven_factory$new()
# Call cook method for 1 second
a_microwave_oven$cook(1)
## [1] "Your food is cooked!"
There is one special public method named initialize() (note the American English spelling). This is not called directly by the user. Instead, it is called automatically when an object is created; that is, when the user calls new().
initialize() lets you set the values of the private fields when you create an R6 object. The pattern for an initialize() function is as follows:
thing_factory <- R6Class(
"Thing",
private = list(
a_field = "a value",
another_field = 123
),
public = list(
initialize = function(a_field, another_field) {
if(!missing(a_field)) {
private$a_field <- a_field
}
if(!missing(another_field)) {
private$another_field <- another_field
}
}
)
)
Notice the use of missing(). This returns TRUE if an argument wasn’t passed in the function call.
Arguments to the factory’s new() method are passed to initialize().
# Add an initialize method
microwave_oven_factory <- R6Class(
"MicrowaveOven",
private = list(
power_rating_watts = 800,
door_is_open = FALSE
),
public = list(
cook = function(time_seconds) {
Sys.sleep(time_seconds)
print("Your food is cooked!")
},
open_door = function() {
private$door_is_open <- TRUE
},
close_door = function() {
private$door_is_open <- FALSE
},
# Add initialize() method here
initialize = function(power_rating_watts, door_is_open) {
if(!missing(power_rating_watts)) {
private$power_rating_watts <- power_rating_watts
}
if(!missing(door_is_open)) {
private$door_is_open <- door_is_open
}
}
)
)
# Make a microwave
a_microwave_oven <- microwave_oven_factory$new(
power_rating_watts = 650,
door_is_open = TRUE
)
Data values stored in the private element of an R6 class are not directly accessible by the user. However, sometimes you may wish to provide controlled access to these data fields. There are two access cases: you may want to retrieve the data field, or you may want to change it. In OOP this is known as getting or setting the data.
In R6 this controlled access to private fields is achieved through Active Bindings. Active Bindings are defined like functions but are accessed like data variables.
Active Bindings are added to the active element of a class. The active element must be a named list. One of the R6 restrictions is that elements of private, public and active must all have different names.
A useful convention to distinguish private and active elements is to start all private fields with a double dot. For you as a programmer, this makes the private fields stand out, so you have a quick visual way of signifying that these variables are not available for consumption by the user.
The simplest case is to create a read-only active binding. That means that you only want to retrieve a data field rather than being able to change it. In this case the function takes no arguments and you can simply return the corresponding private field. In the example below, the active binding a_field returns the private field ..a_field.
Since the a_field binding is a function you can apply/include custom logic. For example if the data field was missing you can return a default value.
##
## Attaching package: 'assertive'
## The following objects are masked from 'package:pryr':
##
## is_s3_generic, is_s3_method
thing_factory <- R6Class(
"Thing",
private = list(
..a_field = "a value",
..another_field = 123
),
active = list(
a_field = function(){
if(is.na(private$..a_field)){
return("a missing value")
}
private$..a_field
}
,another_field = function(value){
if(missing(value)){
private$..another_field
} else {
assert_is_a_number(value)
private$..another_field <- value
}
}
)
)
A more complex case is when you want users to be able to change the value of a data field as well. In this case the binding function should take a single argument, by convention named value. If value is missing, the function just returns the private data field as before. However, when value is passed to the active binding, you need some logic to set the private value.
The purpose of active bindings is to allow controlled access to the private fields. This means that you can add custom logic to check the value before you assign it. For example, if another_field should only contain a single number, you can use assert_is_a_number() from the assertive package to check this condition and throw an error if the value is something else. Notice that the active binding is accessed like a data variable, not a function: there are no parentheses at the end. Since a_field was defined as a read-only binding, you get an error if you try to change it.
By contrast, you can set another_field, but the logic in the binding requires that the value be a single number.
## [1] "a value"
## Error in (function () : unused argument (.Primitive("quote")("a new value"))
## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.
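The behavior described above can be tried out with a self-contained sketch like the following. To avoid depending on the assertive package, the check is done with stopifnot(), which is a substitution made for this example rather than the original code; the try() wrappers are also additions so that the script keeps running after each error.

```r
library(R6)

thing_factory <- R6Class(
  "Thing",
  private = list(
    ..a_field = "a value",
    ..another_field = 123
  ),
  active = list(
    # Read-only binding: the function takes no arguments
    a_field = function() {
      private$..a_field
    },
    # Read-write binding: value is missing when reading
    another_field = function(value) {
      if (missing(value)) {
        private$..another_field
      } else {
        # Stand-in for assert_is_a_number() from assertive
        stopifnot(is.numeric(value), length(value) == 1)
        private$..another_field <- value
      }
    }
  )
)

a_thing <- thing_factory$new()

# Reading looks like accessing a data variable
a_thing$a_field

# Setting a valid value works
a_thing$another_field <- 456
a_thing$another_field

# Setting a read-only binding fails
read_only_failed <- inherits(
  try(a_thing$a_field <- "a new value", silent = TRUE),
  "try-error"
)

# Setting an invalid value fails
bad_value_failed <- inherits(
  try(a_thing$another_field <- "not a number", silent = TRUE),
  "try-error"
)
```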
# Add a binding for power rating
microwave_oven_factory <- R6Class(
"MicrowaveOven",
private = list(
..power_rating_watts = 800
),
active = list(
# Add the binding here
power_rating_watts = function(){
private$..power_rating_watts
}
)
)
# Make a microwave
a_microwave_oven <- microwave_oven_factory$new()
# Get the power rating
a_microwave_oven$power_rating_watts
## [1] 800
# Add a binding for power rating
microwave_oven_factory <- R6Class(
"MicrowaveOven",
private = list(
..power_rating_watts = 800,
..power_level_watts = 800
),
# Add active list containing an active binding
active = list(
power_level_watts = function(value) {
if(missing(value)) {
# Return the private value
private$..power_level_watts
} else {
# Assert that value is a number
assert_is_a_number(value)
# Assert that value is in a closed range from 0 to power rating
assert_all_are_in_closed_range(value,0,private$..power_rating_watts)
# Set the private power level to value
private$..power_level_watts <- value
}
}
)
)
# Make a microwave
a_microwave_oven <- microwave_oven_factory$new()
# Get the power level
a_microwave_oven$power_level_watts
## [1] 800
## Error in (function (value) : is_a_number : value is not of class 'numeric'; it has class 'character'.
## Error in (function (value) : is_in_closed_range : value are not all in the range [0,800].
## There was 1 failure:
## Position Value Cause
## 1 1 1600 too high
Copying and pasting is a big source of bugs and usually a sign that you are writing bad code. Inheritance avoids it: if you make any changes in the parent class, those changes are mirrored in the child class. To implement inheritance, R6 uses the inherit argument. Classes that inherit from the original (parent) class are called child classes. All the data and functionality of the parent class are passed to the child class, that is, all the fields from the private, public and active elements. You can also add additional functionality to the child. The important thing to remember is that inheritance only works in one direction: the parent class does not inherit the traits of its child.
Inheritance means that the methods of the child class are exact copies of those in the parent class, and you can add additional methods in the child class.
child_thing_factory <- R6Class(
"ChildThing",
inherit = thing_factory
)
a_thing <- thing_factory$new()
class(a_thing)
## [1] "Thing" "R6"
## [1] TRUE
## [1] TRUE
## [1] "ChildThing" "Thing" "R6"
## [1] TRUE
## [1] TRUE
## [1] TRUE
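The one-directional nature of inheritance can be checked with inherits(). The sketch below uses empty classes for brevity, and the variable name a_child_thing is an assumption; it is not necessarily the code that produced the output above.

```r
library(R6)

# A parent class and a child class that inherits from it
thing_factory <- R6Class("Thing")
child_thing_factory <- R6Class(
  "ChildThing",
  inherit = thing_factory
)

a_thing <- thing_factory$new()
a_child_thing <- child_thing_factory$new()

# The child carries the parent's class as well as its own
class(a_thing)        # "Thing" "R6"
class(a_child_thing)  # "ChildThing" "Thing" "R6"

# The child inherits from the parent, but not the other way round
inherits(a_child_thing, "Thing")  # TRUE
inherits(a_thing, "ChildThing")   # FALSE
```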
# Explore the microwave oven class
microwave_oven_factory
# Define a fancy microwave class inheriting from microwave oven
fancy_microwave_oven_factory <- R6Class(
"FancyMicrowaveOven",
inherit = microwave_oven_factory
)
# Explore microwave oven classes
microwave_oven_factory
fancy_microwave_oven_factory
# Instantiate both types of microwave
a_microwave_oven <- microwave_oven_factory$new()
a_fancy_microwave <- fancy_microwave_oven_factory$new()
# Get power rating for each microwave
microwave_power_rating <- a_microwave_oven$power_rating_watts
fancy_microwave_power_rating <- a_fancy_microwave$power_rating_watts
# Verify that these are the same
identical(microwave_power_rating, fancy_microwave_power_rating)
# Cook with each microwave
a_microwave_oven$cook(1)
a_fancy_microwave$cook(1)
Simply creating a new class that inherits from another class isn’t useful by itself. What you really want the child class to do is add new functionality.
This can be done in two ways: by overriding existing functionality or by extending it.
To override functionality, you define elements with the same names as those in the parent. To extend functionality, you simply define new public methods or private data fields.
Public methods can call other public methods by prefixing their name with self$.
# Explore microwave oven class
microwave_oven_factory
# Extend the class definition
fancy_microwave_oven_factory <- R6Class(
"FancyMicrowaveOven",
inherit = microwave_oven_factory,
# Add a public list with a cook baked potato method
public = list(
cook_baked_potato = function(){
self$cook(3)
}
)
)
# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()
# Call the cook_baked_potato() method
a_fancy_microwave$cook_baked_potato()
Child classes can access public methods from their parent class by prefixing the name with super$.
# Explore microwave oven class
microwave_oven_factory
# Update the class definition
fancy_microwave_oven_factory <- R6Class(
"FancyMicrowaveOven",
inherit = microwave_oven_factory,
# Add a public list with a cook method
public = list(
cook = function(time_seconds){
super$cook(time_seconds)
message("Enjoy your dinner!")
}
)
)
# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()
# Call the cook() method
a_fancy_microwave$cook(1)
R6 allows multiple levels of inheritance. However, R6 objects only have access to functionality from their direct parent class. To access functionality across multiple generations, the intermediate generations must expose their parents using an active binding. This active binding is conventionally named super_ and simply returns the super object.
thing_factory <- R6Class(
"Thing",
public = list(
do_something = function(){
message("the parent do_something method")
}
)
)
child_thing_factory <- R6Class(
"ChildThing",
inherit = thing_factory,
public = list(
do_something = function(){
message("the child do_something method")
}
),
active = list(
super_ = function() super
)
)
grand_child_thing_factory <- R6Class(
"GrandChildThing",
inherit = child_thing_factory,
public = list(
do_something = function(){
message("the grand-child do_something method")
super$do_something()
super$super_$do_something()
}
)
)
a_grand_child_thing <- grand_child_thing_factory$new()
a_grand_child_thing$do_something()
## the grand-child do_something method
## the child do_something method
## the parent do_something method
# Expose the parent functionality
fancy_microwave_oven_factory <- R6Class(
"FancyMicrowaveOven",
inherit = microwave_oven_factory,
public = list(
cook_baked_potato = function() {
self$cook(3)
},
cook = function(time_seconds) {
super$cook(time_seconds)
message("Enjoy your dinner!")
}
),
# Add an active element with a super_ binding
active = list(
super_ = function() super
)
)
# Instantiate a fancy microwave
a_fancy_microwave <- fancy_microwave_oven_factory$new()
# Call the super_ binding
a_fancy_microwave$super_
# Explore other microwaves
microwave_oven_factory
fancy_microwave_oven_factory
# Define a high-end microwave oven class
high_end_microwave_oven_factory <- R6Class(
"HighEndMicrowaveOven",
inherit = fancy_microwave_oven_factory,
public = list(
cook = function(time_seconds){
super$super_$cook(time_seconds)
message(ascii_pizza_slice)
}
)
)
# Instantiate a high-end microwave oven
a_high_end_microwave <- high_end_microwave_oven_factory$new()
# Use it to cook for one second
a_high_end_microwave$cook(1)
As you saw earlier, environments have special copy by reference behavior. Since R6 objects are built using environments, they also use copy by reference. If you create an object and then use assignment to copy it, changing a field through one name changes it for every name, because they all refer to the same object.
thing_factory <- R6Class(
"Thing",
private = list(
..a_field = 123
),
active = list(
a_field = function(value){
if(missing(value)) {
private$..a_field
} else {
private$..a_field <- value
}
}
)
)
a_thing <- thing_factory$new()
a_copy <- a_thing
a_thing$a_field <- 456
a_copy$a_field
## [1] 456
Sometimes this isn’t the behavior that you want, so all R6 objects have a method named clone() that creates an independent copy (copy by value). You don’t need to define this method yourself; it is generated automatically. To copy an object using the more standard copy by value behavior, just call the clone method without any arguments.
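a_clone <- a_thing$clone()
# 789 is an arbitrary new value chosen for illustration
a_thing$a_field <- 789
a_clone$a_field
## [1] 456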
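One special case is when an R6 object contains another R6 object in one of its fields. By default, clone() copies such a field by reference, so the original and the clone share it.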
container_factory <- R6Class(
"Container",
private = list(
..thing = thing_factory$new()
),
active = list(
thing = function(value){
if(missing(value)) {
private$..thing
} else {
private$..thing <- value
}
}
)
)
a_container <- container_factory$new()
a_clone <- a_container$clone()
a_container$thing$a_field <- "a new value"
a_clone$thing$a_field
## [1] "a new value"
To get copy by value for the internal R6 object, you need to call clone() with the argument deep = TRUE. Because of this, changes to a_container$thing$a_field aren’t propagated to a_deep_clone. So if an R6 object contains other R6 objects, you have to pass deep = TRUE to get copy by value behavior for those fields.
a_deep_clone <- a_container$clone(deep = TRUE)
a_container$thing$a_field <- "a different value"
a_deep_clone$thing$a_field
## [1] "a new value"
# Create a microwave oven
a_microwave_oven <- microwave_oven_factory$new()
# Copy a_microwave_oven using <-
assigned_microwave_oven <- a_microwave_oven
# Copy a_microwave_oven using clone()
cloned_microwave_oven <- a_microwave_oven$clone()
# Change a_microwave_oven's power level
a_microwave_oven$power_level_watts <- 400
# Check a_microwave_oven & assigned_microwave_oven same
identical(a_microwave_oven$power_level_watts, assigned_microwave_oven$power_level_watts)
# Check a_microwave_oven & cloned_microwave_oven different
identical(a_microwave_oven$power_level_watts, cloned_microwave_oven$power_level_watts)
If an R6 object contains another R6 object in one or more of its fields, then by default clone() will copy the R6 fields by reference. To copy those R6 fields by value, the clone() method must be called with the argument deep = TRUE.
# Create a microwave oven
a_microwave_oven <- microwave_oven_factory$new()
# Look at its power plug
a_microwave_oven$power_plug
# Copy a_microwave_oven using clone(), no args
cloned_microwave_oven <- a_microwave_oven$clone()
# Copy a_microwave_oven using clone(), deep = TRUE
deep_cloned_microwave_oven <- a_microwave_oven$clone(deep = TRUE)
# Change a_microwave_oven's power plug type
a_microwave_oven$power_plug$type <- "British"
# Check a_microwave_oven & cloned_microwave power plug types same
identical(a_microwave_oven$power_plug$type, cloned_microwave_oven$power_plug$type)
# Check a_microwave_oven & deep_cloned_microwave power plug types different
identical(a_microwave_oven$power_plug$type, deep_cloned_microwave_oven$power_plug$type)
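The three ways of copying described above can be compared side by side in a small self-contained sketch. The Plug and Appliance classes and their fields are hypothetical stand-ins for the microwave and its power plug, not the classes defined earlier.

```r
library(R6)

# Hypothetical classes for illustration only
plug_factory <- R6Class(
  "Plug",
  public = list(type = "American")
)

appliance_factory <- R6Class(
  "Appliance",
  public = list(
    watts = 800,
    plug = NULL,
    initialize = function() {
      # Create the nested R6 object separately for each instance
      self$plug <- plug_factory$new()
    }
  )
)

original <- appliance_factory$new()
assigned <- original                     # same object, copied by reference
shallow <- original$clone()              # own watts field, shared plug
deep <- original$clone(deep = TRUE)      # own watts field and own plug

# Change the original after the copies have been made
original$watts <- 400
original$plug$type <- "British"

assigned$watts     # 400: assignment did not create a new object
shallow$watts      # 800: plain fields are copied by value
shallow$plug$type  # "British": the nested R6 object is still shared
deep$plug$type     # "American": the nested R6 object was copied too
```

Only the deep clone is fully independent of the original; the shallow clone protects plain fields but not fields that hold other R6 objects.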
If an R6 object connects to a database or a file, then it can be dangerous to delete it without making sure that you close the connections first. Similarly, if the R6 object has any side effects, such as changing global options or global plotting parameters, then it is good practice to return those settings to their previous state.
The initialize method customizes behavior when an object is created (custom startup). It has a counterpart method named finalize that allows custom behavior when an R6 object is destroyed (custom cleanup).
finalize is always a function with no arguments, defined here in the public element of an R6 class (newer R6 releases also accept it in private). When you delete an R6 object, the finalize method isn’t called immediately. That happens when the object is garbage collected by R’s memory management system. You can force this to occur by calling the gc() function.
In summary, finalize is used for cleanup when an object gets destroyed. It is especially useful for R6 classes that connect to databases or files, since it is important that these connections eventually get closed. finalize gets called when the object is garbage collected by R.
thing_factory <- R6Class(
"Thing",
private = list(
..a_field = 123
),
public = list(
initialize = function(a_field){
if(!missing(a_field)){
private$..a_field <- a_field
}
},
finalize = function(){
message("Finalize this thing")
}
)
)
a_thing <- thing_factory$new()
rm(a_thing)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 626831 33.5 1244284 66.5 1244284 66.5
## Vcells 1221428 9.4 8388608 64.0 2140529 16.4
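To make the timing of finalize easier to observe, here is a minimal sketch that counts cleanups in a shared environment. The names cleanup_log and temp_thing_factory are hypothetical, and finalize is kept in public to match the description above.

```r
library(R6)

# Hypothetical log: an environment, so finalize() can update it by reference
cleanup_log <- new.env()
cleanup_log$count <- 0

temp_thing_factory <- R6Class(
  "TempThing",
  public = list(
    finalize = function() {
      # Record that cleanup happened, then announce it
      cleanup_log$count <- cleanup_log$count + 1
      message("Finalize this thing")
    }
  )
)

a_temp_thing <- temp_thing_factory$new()
rm(a_temp_thing)    # removes the name; finalize() has not run yet
invisible(gc())     # forcing garbage collection runs finalize()
cleanup_log$count   # number of objects cleaned up so far
```

Removing the variable only makes the object unreachable. The counter changes once the garbage collector actually reclaims it, which is why gc() is called explicitly here.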
library(RSQLite)
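database_manager_factory <- R6Class(
"DatabaseManager",
private = list(
conn = NULL
),
public = list(
initialize = function() {
# Open the connection when the object is created
private$conn <- dbConnect(SQLite(), "some-database.sqlite")
},
finalize = function() {
# Close the connection when the object is garbage collected
dbDisconnect(private$conn)
}
)
)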
smart_microwave_oven_factory <- R6Class(
"SmartMicrowaveOven",
inherit = microwave_oven_factory,
private = list(
conn = NULL
),
public = list(
initialize = function() {
private$conn <- dbConnect(SQLite(), "cooking-times.sqlite")
},
get_cooking_time = function(food) {
dbGetQuery(
private$conn,
sprintf("SELECT time_seconds FROM cooking_times WHERE food = '%s'", food)
)
},
finalize = function() {
message("Disconnecting from the cooking times database.")
dbDisconnect(private$conn)
}
)
)
a_smart_microwave <- smart_microwave_oven_factory$new()
# Remove the smart microwave
rm(a_smart_microwave)
# Force garbage collection
gc()
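The same pattern applies to the other kind of side effect mentioned earlier, changing global options. The following is a rough sketch that needs no database; the DigitsSetter class is hypothetical and exists only to show a setting being saved in initialize and restored in finalize.

```r
library(R6)

# Hypothetical class: changes a global option on creation
# and restores the previous value on cleanup
digits_setter_factory <- R6Class(
  "DigitsSetter",
  private = list(
    old_options = NULL
  ),
  public = list(
    initialize = function(digits) {
      # options() returns the previous values, so keep them for later
      private$old_options <- options(digits = digits)
    },
    finalize = function() {
      # Put the global option back the way it was
      options(private$old_options)
    }
  )
)

original_digits <- getOption("digits")
a_setter <- digits_setter_factory$new(3)
digits_while_alive <- getOption("digits")   # 3 while the object exists

# Remove the object and force garbage collection
rm(a_setter)
invisible(gc())
identical(getOption("digits"), original_digits)
```

Because the restore only happens at garbage collection, the option stays changed for an unpredictable time after rm() unless gc() is called, so this approach suits settings where a delayed reset is acceptable.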